Systems and Methods for Improved Adversarial Training of Machine-Learned Models

ABSTRACT

Example aspects of the present disclosure are directed to systems and methods that enable improved adversarial training of machine-learned models. An adversarial training system can generate improved adversarial training examples by optimizing or otherwise tuning one or more hyperparameters that guide the process of generating the adversarial examples. The adversarial training system can determine, solicit, or otherwise obtain a realism score for an adversarial example generated by the system. The realism score can indicate whether the adversarial example appears realistic. The adversarial training system can adjust or otherwise tune the hyperparameters to produce improved adversarial examples (e.g., adversarial examples that are still high-quality and effective while also appearing more realistic). Through creation and use of such improved adversarial examples, a machine-learned model can be trained to be more robust against (e.g., less susceptible to) various adversarial techniques, thereby improving model, device, network, and user security and privacy.

FIELD

The present disclosure relates generally to machine learning. More particularly, the present disclosure relates to systems and methods that enable improved adversarial training by tuning one or more hyperparameters that guide the generation of adversarial training examples.

BACKGROUND

Adversarial machine learning lies at the intersection of machine learning and computer security. In particular, malicious actors can perform a number of adversarial techniques that are aimed at fooling machine-learned models by maliciously crafting samples that are not perceived as being different by humans, but in fact reliably fool the model into providing an incorrect output. As one example, an adversarial input may appear to a human observer as a verbal request for navigational instructions but, due to its maliciously crafted nature, will fool a machine-learned model into inferring that the user has requested a transfer of money to a certain account and/or has requested a passcode or passphrase for a system security check.

Thus, some adversarial techniques can use inputs to machine-learned models that an attacker has intentionally designed to cause the model to make a mistake. As such, training machine-learned models to be robust against (i.e., to not be fooled by) adversarial techniques is important for improving model, device, network, and user security and privacy. As machine-learned models become more pervasive across all products and computerized decision making, the ability of machine-learned models to withstand adversarial attacks will become of vital importance.

One aspect of adversarial training includes generating adversarial training examples and then training the machine-learned model using the generated adversarial training examples as additional training examples. In particular, in one example, an adversarial example can be created that the computer misrecognizes but that a human clearly recognizes correctly. This adversarial example can be used as a “positive” training example for the class that the human assigns to it. In such fashion, machine-learned models can be trained to be more robust against adversarial inputs.

The process of generating adversarial training examples is generally guided by one or more hyperparameters that control aspects of the generation process. However, these hyperparameters are highly sensitive to the format of the input data, the nature of the problem being solved, and other facets of the problem/learning structure. Thus, generating adversarial examples that are both effective and realistic can be a technically challenging, resource-intensive, and time-consuming process. In particular, adversarial training examples that are generated need to be realistic, since unrealistic training examples may lead to another type of misinterpretation.

SUMMARY

Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments.

One example aspect of the present disclosure is directed to a computer-implemented method. The method includes obtaining, by one or more computing devices, a training example for a machine-learned model. The method includes generating, by the one or more computing devices, an adversarial example from the training example according to one or more hyperparameters. The method includes determining, by the one or more computing devices, a realism score for the adversarial example that indicates whether the adversarial example appears realistic. The method includes adjusting, by the one or more computing devices, at least one of the one or more hyperparameters based at least in part on the realism score for the adversarial example.

Another example aspect of the present disclosure is directed to a computer-implemented method. The method includes obtaining, by one or more computing devices, a training example for a machine-learned model. The method includes generating, by the one or more computing devices, an adversarial example from the training example according to one or more hyperparameters. The method includes generating, by the one or more computing devices, a score for the adversarial example. The score represents a position of the adversarial example relative to a position of the training example in an input data space for the model. The method includes adjusting, by the one or more computing devices, at least one of the one or more hyperparameters based at least in part on the score for the adversarial example. The method includes generating, by the one or more computing devices, an additional adversarial example according to the adjusted one or more hyperparameters. The method includes training, by the one or more computing devices, the machine-learned model based at least in part on the additional adversarial example.

Another example aspect of the present disclosure is directed to a mobile computing device. The mobile computing device includes an application. The application includes a machine-learned model. The mobile computing device includes one or more processors and an on-device adversarial training platform implemented by the one or more processors. The on-device adversarial training platform is configured to perform operations. The operations include obtaining a training example for the machine-learned model. The operations include generating an adversarial example from the training example according to one or more hyperparameters. The operations include providing the adversarial example to the application via an application programming interface. The operations include receiving a realism score for the adversarial example from the application via the application programming interface. The realism score indicates whether the adversarial example appears realistic. The operations include adjusting at least one of the one or more hyperparameters based at least in part on the realism score for the adversarial example received from the application via the application programming interface.

Other aspects of the present disclosure are directed to various systems, apparatuses, non-transitory computer-readable media, user interfaces, and electronic devices.

These and other features, aspects, and advantages of various embodiments of the present disclosure will become better understood with reference to the following description and appended claims. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate example embodiments of the present disclosure and, together with the description, serve to explain the related principles.

BRIEF DESCRIPTION OF THE DRAWINGS

Detailed discussion of embodiments directed to one of ordinary skill in the art is set forth in the specification, which makes reference to the appended figures, in which:

FIG. 1 depicts a block diagram of an example computing device according to example embodiments of the present disclosure.

FIG. 2 depicts a graphical diagram of an example process of generating adversarial examples for machine-learned models according to example embodiments of the present disclosure.

FIG. 3 depicts a graphical diagram of example techniques to train machine-learned models according to example embodiments of the present disclosure.

FIG. 4 depicts a flowchart diagram of an example method to enable adversarial training of machine-learned models according to example embodiments of the present disclosure.

Reference numerals that are repeated across plural figures are intended to identify the same features in various implementations.

DETAILED DESCRIPTION

Overview

Example aspects of the present disclosure are directed to systems and methods that enable improved adversarial training of machine-learned models. In particular, the present disclosure provides an adversarial training system that generates improved adversarial training examples by optimizing or otherwise tuning one or more hyperparameters that guide the process of generating the adversarial examples. More particularly, the adversarial training system can determine, solicit, or otherwise obtain a realism score for an adversarial example generated by the system. The realism score can indicate whether the adversarial example appears realistic. The adversarial training system can adjust or otherwise tune the hyperparameters to produce improved adversarial examples (e.g., adversarial examples that are still high-quality and effective while also appearing more realistic). Through creation and use of such improved adversarial examples, a machine-learned model can be trained to be more robust against (e.g., less susceptible to) various adversarial techniques, thereby improving model, device, network, and user security and privacy.

According to one aspect of the present disclosure, the adversarial training system can communicate or otherwise cooperatively operate with one or more distinct components or systems (e.g., an application that includes a machine-learned model) by way of one or more application programming interfaces (APIs). For example, the adversarial training system can adversarially train a machine-learned model by passing data back and forth with a component (e.g., application) that is responsible for and/or runs the machine-learned model.

Thus, a machine learning API provided by the present disclosure can allow the adversarial training system to support adversarial training for any model. In addition, the adversarial training system can provide a set of utilities through which developers can make use of this API to fine-tune the hyperparameters that guide generation of adversarial examples for their models. In one particular example, the adversarial training system can provide a newly generated adversarial example to an application via an API. The application can assess the realism of the adversarial example to generate a realism score and then the application can provide the realism score to the adversarial training system via the API. As examples, the application can determine the realism score for the adversarial example according to some heuristic or scoring function and/or by providing (e.g., displaying) the example to a user and seeking user feedback regarding whether the example appears realistic. Thus, in some implementations, the adversarial training system uses or otherwise leverages application-specific user feedback or programmatic heuristics to improve adversarial training of the corresponding machine-learned model, without the need for the developers of the application to specifically understand how the adversarial training system works internally.
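
As a non-limiting illustration of this division of responsibilities, the exchange across the API boundary could be organized as sketched below; the class, method, and callback names are hypothetical and are not part of the disclosure itself.

```python
class AdversarialTrainingAPI:
    """Hypothetical sketch of the request/response cycle across the API boundary."""

    def __init__(self, generator, realism_callback):
        self.generator = generator
        # The application registers a callback that returns a realism score,
        # e.g., from a heuristic scoring function or from user feedback.
        self.realism_callback = realism_callback

    def propose_and_score(self, training_example, hyperparameters):
        # System side: generate a candidate adversarial example.
        adversarial_example = self.generator.generate(training_example, hyperparameters)
        # Application side: assess realism and return a score via the API.
        realism_score = self.realism_callback(adversarial_example)
        return adversarial_example, realism_score
```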

More particularly, a computing system can implement an adversarial training system. The computing system can include one or more computing devices. As one example, in some implementations, a user computing device (e.g., a smartphone, tablet, laptop, etc.) can locally implement the adversarial training system as an on-device platform. For example, the user computing device can operate to provide the adversarial training system as a service (e.g., via one or more APIs) to machine-learned models locally stored on the device (e.g., included in one or more applications installed on and executed by the device). As another example, in some implementations, a server computing system can implement the adversarial training system as a service that is accessible to devices over a network (e.g., via one or more APIs). As yet another example, certain aspects of the adversarial training system can be performed on-device (e.g., by the user computing device) while other aspects of the adversarial training system can be performed by the server computing system.

According to an aspect of the present disclosure, the adversarial training system can obtain a training example for a machine-learned model. The training example can be a training example intended for use in training the machine-learned model. For example, the training example can be a positive training example for a certain model output. The training example can be part of an initial training batch or can be part of a new training batch for model update or re-training.

As one example, in some implementations, the training example can be a personal training example that is stored at a local memory of the user computing device. The machine-learned model can also be stored at the local memory of the user computing device. Thus, in some implementations, the adversarial training system can operate on-device to perform personalized adversarial training that is seeded with personal training examples. This personalized adversarial training can be combined with other on-device training techniques such as, for example, personalized learning frameworks and/or federated learning frameworks, in which model updates are computed locally and then communicated to a centralized system for aggregation to determine a global update.

According to another aspect of the present disclosure, the adversarial training system can generate an adversarial example from the training example according to one or more hyperparameters. In some implementations, the adversarial training system can generate the adversarial example by determining a direction of a gradient of a loss function that evaluates an output provided by the machine-learned model when given at least a portion of the training example as an input. The adversarial training system can perturb the training example in a second direction that is based on (e.g., opposite to) the direction of the gradient of the loss function to generate the adversarial example. As one example, the adversarial training system can treat the input data as optimizable parameters and backpropagate the loss function all the way through the model and further through the input data to modify the input data (e.g., in the second direction). In some implementations, the model parameters can be fixed during such backpropagation. Other techniques for generating adversarial examples can be used as well in addition or alternatively to the opposite gradient direction technique described above.
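
For concreteness, a minimal sketch of this gradient-based perturbation is shown below, assuming a TensorFlow model, a Keras-style loss function, and the sign-of-gradient (infinity-norm) variant of the step; the function name and the epsilon value are illustrative only.

```python
import tensorflow as tf

def generate_adversarial_example(model, x, y, loss_fn, epsilon=0.1):
    """Perturb a training example using the gradient of the loss with respect to the input.

    The model parameters stay fixed; only the input is treated as optimizable.
    `epsilon` plays the role of the step size hyperparameter discussed below.
    """
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)                    # treat the input data as optimizable
        loss = loss_fn(y, model(x))      # loss for the original (correct) label
    gradient = tape.gradient(loss, x)    # backpropagate through the model and the input
    # Step against the direction training would move for this sample,
    # i.e., along the sign of the gradient so that the loss increases.
    return x + epsilon * tf.sign(gradient)
```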

More particularly, adversarial training can generate, for every batch of training examples, several adversarial versions of those examples by, for example, intentionally perturbing a training example in the direction opposite to the one in which the model would move by training on that sample. In some implementations, this can mean taking a step in a direction other than the gradient for the training example.

According to an aspect of the present disclosure, the adversarial training system can generate the adversarial training example according to or in accordance with one or more hyperparameters. The hyperparameters can be configurable parameters of the generation process.

As one example, the one or more hyperparameters can include a step size hyperparameter that controls a magnitude of a step in the second direction performed when perturbing the training example (e.g., according to the backpropagation technique described above). In some instances, the step size hyperparameter can be referred to or otherwise represented by an epsilon.

In particular, to make sure the resulting adversarial examples are realistic, it is useful to generally know how far the generation process is allowed to move in the input space. Larger steps can generate more effective or meaningful adversarial examples. However, step sizes that are too large can result in adversarial examples that do not appear realistic and therefore do not assist in combating adversarial techniques that rely on realistic, but malicious input data.

As another example, the one or more hyperparameters can include a norm hyperparameter that controls a norm applied to the gradient prior to said perturbing. For example, the norm hyperparameter can control whether the norm of the gradient is taken when determining the second direction. Further, if the norm is to be taken, the norm hyperparameter can control which norm is applied. Example norms that can be applied include the infinity norm, the L2 norm, or other norms.

As yet another example, the one or more hyperparameters can include a loss hyperparameter that controls the loss function for which the gradient is determined. Example losses that can be used as the loss function include a cross-entropy loss, a cost of loss, or other loss functions. Furthermore, in some implementations, an additional perturbation function can be applied on top of the gradient and can be controlled by the loss hyperparameter and/or other hyperparameters.

As another example, the one or more hyperparameters can include whether to perform iterative perturbations to the sample. As a further example, in the event iterative perturbations are to be performed, an additional example hyperparameter can control how many iterations should be performed or how to modify various settings, controls, etc. between each iteration.
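
The hyperparameters enumerated above might be collected in a single configuration object that is handed to the generation process; the field names and defaults below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdversarialGenerationConfig:
    """Illustrative container for the generation hyperparameters described above."""
    step_size: float = 0.1             # epsilon: magnitude of each perturbation step
    norm: Optional[str] = "inf"        # norm applied to the gradient ("inf", "l2", or None)
    loss_name: str = "cross_entropy"   # which loss function the gradient is taken of
    iterative: bool = False            # whether to apply the perturbation repeatedly
    num_iterations: int = 1            # iteration count when `iterative` is True
```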

According to another aspect of the present disclosure, the adversarial training system can determine a realism score for the adversarial example that indicates whether the adversarial example appears realistic. For example, the realism score for the adversarial example can indicate whether the adversarial example appears realistic to a human observer.

In some implementations, to determine the realism score, the adversarial training system can provide the adversarial example to an application via an application programming interface. The application can generate a realism score for the adversarial example. The adversarial training system can receive the realism score for the adversarial example from the application via the application programming interface.

In some implementations, the computing system (e.g., the application and/or the adversarial training system) can determine the realism score for the adversarial example by inputting the adversarial example into a scoring function that heuristically evaluates the adversarial example.

As one example, the scoring function can determine whether the adversarial example still matches an input data space. For example, if the adversarial example exceeds an input boundary or otherwise does not conform to the acceptable input data space, the scoring function can score the adversarial example as being less realistic. In contrast, if the adversarial example matches the input data space, the scoring function can score the adversarial example as being more realistic. As one example of this concept applied to natural language processing inputs, the scoring function can determine whether the adversarial example includes nonsense words or undefined words and, if so, the adversarial example can be viewed as failing to match the input data space.

In some implementations, the scoring function and/or other component can provide and/or perform a corrective action that would allow the adversarial example to become compliant with the input data space.

As another example, for adversarial examples that include imagery, the scoring function can determine an L2 distance between pixel values of the imagery. The scoring function can provide a realism score based at least in part on the L2 distance(s). Many other and different scoring functions can be used to assess the realism of input data of other and different types.
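
Two such heuristics are sketched below, assuming NumPy arrays for imagery (with the L2 distance measured against the original example) and a fixed vocabulary for text inputs; the cutoff value and scaling are illustrative.

```python
import numpy as np

def image_realism_score(original: np.ndarray, adversarial: np.ndarray,
                        max_distance: float = 10.0) -> float:
    """Map the L2 distance between pixel values to a realism score in [0, 1]."""
    distance = np.linalg.norm(adversarial.astype(float) - original.astype(float))
    # Small perturbations score near 1.0; distances at or beyond the
    # (illustrative) cutoff score 0.0.
    return float(max(0.0, 1.0 - distance / max_distance))

def text_realism_score(adversarial_text: str, vocabulary: set) -> float:
    """Score text realism as the fraction of words that still match the input data space."""
    words = adversarial_text.split()
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)
```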

In some implementations, in addition or alternatively to applying a scoring function, the computing system (e.g., the application and/or the adversarial training system) can determine the realism score for the adversarial example by providing the adversarial example for display to a human user and receiving feedback from the human user that indicates whether the adversarial example appears realistic.

In some implementations, the adversarial training system can determine whether to keep (e.g., for use in adversarially training the machine-learned model) or discard the adversarial example based at least in part on the realism score. As one example, when the realism score is greater than a threshold score, the adversarial training system can store the adversarial example for use in training the machine-learned model. In contrast, when the realism score is less than the threshold score, the adversarial training system can discard the adversarial example. In some implementations, the threshold score can be a user-configurable variable.

According to another aspect of the present disclosure, the adversarial training system can adjust at least one of the one or more hyperparameters based at least in part on the realism score for the adversarial example. For example, the adversarial training system can tweak the hyperparameters in a way which will provide more realistic adversarial samples. For example, this may include reducing the step size hyperparameter, changing the norm hyperparameter, changing the loss hyperparameter, and/or increasing the iteration count hyperparameter.

After adjusting the hyperparameter(s), the adversarial training system can generate an additional adversarial example according to the adjusted hyperparameters. The adversarial training system or other training component can train the machine-learned model based at least in part on the additional adversarial example. For example, the additional adversarial example can be designated as a positive training example for a class that would be recognized by a human observer.

According to another aspect of the present disclosure, the adversarial training system can iteratively generate and evaluate (e.g., determine a realism score for) adversarial training examples. For example, the adversarial training system can iteratively generate and evaluate adversarial training examples until the generated adversarial example(s) satisfy one or more criteria. For example, the adversarial training system can iteratively adjust one or more hyperparameters until a most recent realism score exceeds a threshold score; until a moving average of realism scores exceeds the threshold score; and/or until one or more other criteria are met. In some implementations, the threshold score can be a user-configurable variable. In such fashion, the adversarial training system can iteratively generate and evaluate adversarial training examples until a desired balance between realism and significance is reached.

As one example, the adversarial training system can iteratively reduce a step size hyperparameter that controls a magnitude of a step performed when generating the adversarial example. For example, the adversarial training system can iteratively reduce the step size hyperparameter until a most recent realism score exceeds a threshold score; until a moving average of realism scores exceeds the threshold score; and/or until one or more other criteria are met.
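
A minimal sketch of this loop is given below, reusing the illustrative configuration object from the earlier sketch; the decay factor, threshold default, and iteration cap are assumptions rather than prescribed values.

```python
def tune_step_size(generator, score_fn, training_example, config,
                   realism_threshold=0.8, decay=0.5, max_rounds=10):
    """Iteratively shrink the step size until a sufficiently realistic example is found."""
    kept_examples = []
    for _ in range(max_rounds):
        adversarial_example = generator.generate(training_example, config)
        realism_score = score_fn(adversarial_example)
        if realism_score >= realism_threshold:
            kept_examples.append(adversarial_example)  # keep for adversarial training
            break
        # Unrealistic: discard this example and take smaller steps next round.
        config.step_size *= decay
    return kept_examples, config
```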

It will be appreciated that, in these implementations, the generation of effective, application-specific adversarial examples allows the model to be effectively trained against malicious adversarial attacks and thereby increases the security of the system against such attacks.

A number of different features can be built on top of the adversarial training system, and their characteristics can depend on the application. As one example application, a recommendation system can be trained on-device for personalization. The recommendation system can train on all the samples the user generates and, in this process, might ask the user follow-up questions that would in fact be the adversarial samples generated by the adversarial training system. For example, if the user selects to watch a superhero movie on a streaming service, then through the adversarial training one hard/close sample would be whether the user likes superhero content in general, and this could be rendered as a question explicitly. Alternatively, the system could just recommend this hard sample as one of the next videos and use it as a negative example if the user dismisses the recommendation.

In another example application, an optical character recognition application could make use of the adversarial training system by showing the user a perturbed version of a photograph that was uploaded for recognition. The application could ask the user whether the perturbed photograph is still understandable (e.g., readable).

According to another aspect, in some implementations, without making use of the gradients, the adversarial training system can also be built by exploring what the model's results are for all the neighbors of a sample (which in some instances is referred to as a “saliency map”). As another example, in some implementations, the system can make use of generative adversarial networks to have a neural network trained jointly instead of using more generic scoring heuristics.

The systems and methods of the present disclosure provide a number of technical effects and benefits. As one example technical effect and benefit, the systems and methods of the present disclosure can train machine-learned models more robustly against adversarial attacks. This can improve model, device, network, and user security and privacy. For example, as outlined above, the model may be less susceptible to attempts by a fraudulent party to gain access to, or otherwise fraudulently instruct, a computing system using an adversarial attack.

As another example technical effect and benefit, the systems and methods of the present disclosure can improve the overall quality of the model by enabling the model to make better predictions. Thus, the performance of the model itself and, therefore, the performance of the system which relies upon the model's inferences can be improved.

As another example technical effect and benefit, the systems and methods of the present disclosure can provide a centralized adversarial training service so that applications do not each need to include the full systems and capabilities to perform adversarial training. As such, a given application is not required to adversarially train machine-learned model(s) but can instead simply communicate with the adversarial training system to adversarially train the model(s). This can enable the data size of applications to be smaller. It can also simplify the development and deployment of applications or other clients, as application developers are not required to learn the intricacies of adversarial training but can instead simply rely upon usage of the system APIs.

As yet another example technical effect and benefit, in implementations in which the adversarial training system is implemented as an on-device platform, the systems and methods of the present disclosure can improve communication network efficiency and usage. That is, under past paradigms where adversarial training is performed by a server rather than on-device, various types of information (e.g., input data, training examples, inferences, model parameters, etc.) were required to be transmitted between the device and the server over a communications network (e.g., the Internet). However, since the present disclosure enables on-device adversarial training and/or other machine learning tasks or functionality, such information is not required to be transmitted (at least in every instance) over a communications network. Therefore, communications network traffic, efficiency, and usage are improved. In addition, since the input data, training examples, etc. are not being transmitted to and from a server, the security of the data may be increased.

Example Devices and Systems

FIG. 1 depicts a block diagram of an example computing device 102 that includes an adversarial training system 122 according to example embodiments of the present disclosure. Device 102 is provided as one example only. Many different devices and systems can be used to implement aspects of the present disclosure.

The computing device 102 can be any type of computing device including, for example, a desktop, a laptop, a tablet computing device, a smartphone, a wearable computing device, a gaming console, an embedded computing device, or other forms of computing devices. Thus, in some implementations, the computing device 102 can be a mobile computing device and/or a user computing device.

The computing device 102 includes one or more processors 112 and a memory 114. The one or more processors 112 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 114 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. The memory 114 can store data and instructions which are executed by the processor 112 to cause the computing device 102 to perform operations. The computing device 102 can also include a network interface 116 that enables communications over one or more networks (e.g., the Internet).

The computing device 102 can store or otherwise include one or more applications 120 a-c (e.g., mobile applications). One or more of the applications 120 a-c may have one or more machine-learned models that the applications want to adversarially train. For example, the application 120 a can include a first machine-learned model 132 a and a first training example cache 134 a. Likewise, the application 120 b can have a machine-learned model 132 b and a training example cache 134 b while the application 120 c can have a machine-learned model 132 c and a training example cache 134 c. Some applications can include multiple machine-learned models. However, some applications may not have machine-learned models. In some implementations, the adversarial training system can also include its own training example cache 124.

In some implementations, one or more of the applications 120 a-c can further include a respective machine learning library. The machine learning libraries can include one or more machine learning engines (e.g., a TensorFlow engine), a neural network library, and/or other components that enable implementation of machine-learned models 132 a-c for inference and/or training. In other implementations, the machine learning libraries can be stored at and/or implemented by the adversarial training system 122 and provided as a service to the applications 120 a-c by the adversarial training system 122.

The adversarial training system 122 enables improved adversarial training of the machine-learned models 132 a-c. In particular, the adversarial training system 122 can generate improved adversarial training examples by optimizing or otherwise tuning one or more hyperparameters that guide an adversarial example generator 126 that generates the adversarial examples.

In particular, the adversarial training system 122 can determine, solicit, or otherwise obtain a realism score for an adversarial example generated by the system 122. As one example, a feedback manager 128 can communicate with the applications 120 a-c (e.g., via an API) to obtain feedback in the form of a realism score.

The realism score can indicate whether the adversarial example appears realistic. The adversarial training system 122 can adjust or otherwise tune the hyperparameters to produce improved adversarial examples (e.g., adversarial examples that are still high-quality and effective while also appearing more realistic). Through creation and use of such improved adversarial examples, a machine-learned model can be trained to be more robust against (e.g., less susceptible to) various adversarial techniques, thereby improving model, device, network, and user security and privacy.

According to one aspect of the present disclosure, the adversarial training system 122 can communicate or otherwise cooperatively operate with one or more distinct components or systems (e.g., application 120 a that includes machine-learned model 132 a) by way of one or more application programming interfaces (APIs). For example, the adversarial training system 122 can adversarially train the machine-learned model 132 a by passing data back and forth with the application 120 a that is responsible for and/or runs the machine-learned model 132 a.

Thus, a machine learning API can allow the adversarial training system 122 to support adversarial training for any model. In addition, the adversarial training system 122 can provide a set of utilities through which developers can make use of this API to fine-tune the hyperparameters that guide generation of adversarial examples for their models.

In one particular example, the feedback manager 128 can provide a newly generated adversarial example to application 120 a via an API. The application 120 a can assess the realism of the adversarial example to generate a realism score and then the application 120 a can provide the realism score to the feedback manager 128 via the API. As examples, the application 120 a can determine the realism score for the adversarial example according to some heuristic or scoring function and/or by providing (e.g., displaying) the example to a user and seeking user feedback regarding whether the example appears realistic. Thus, in some implementations, the adversarial training system 122 uses or otherwise leverages application-specific user feedback or programmatic heuristics to improve adversarial training of the corresponding machine-learned model, without the need for the developers of the application to specifically understand how the adversarial training system works internally.

According to an aspect of the present disclosure, the adversarial training system 122 can obtain a training example for a machine-learned model. The training example can be a training example intended for use in training the machine-learned model. For example, the training example can be a positive training example for a certain model output. The training example can be part of an initial training batch or can be part of a new training batch for model update or re-training.

As one example, in some implementations, the training example can be a personal training example that is stored at a local memory of the user computing device 102. The machine-learned model can also be stored at the local memory of the user computing device. Thus, in some implementations, the adversarial training system can operate on-device to perform personalized adversarial training that is seeded with personal training examples. This personalized adversarial training can be combined with other on-device training techniques such as, for example, personalized learning frameworks and/or federated learning frameworks, in which model updates are computed locally and then communicated to a centralized system for aggregation to determine a global update.

In one example, the adversarial training system 122 can obtain the training example from a training example cache 124 maintained by the adversarial training system 122. For example, these training examples can be generic or cross-application training examples. In another example, the adversarial training system 122 can obtain the training example from a training example cache (e.g., cache 134 a) stored or otherwise maintained by an application (e.g., application 120 a). For example, the application can pass the training example to the adversarial training system 122 using an API.

According to another aspect of the present disclosure, the adversarial example generator 126 of the adversarial training system 122 can generate an adversarial example from the training example according to one or more hyperparameters. In some implementations, the adversarial example generator 126 can generate the adversarial example by determining a direction of a gradient of a loss function that evaluates an output provided by the machine-learned model when given at least a portion of the training example as an input. The adversarial example generator 126 can perturb the training example in a second direction that is based on (e.g., opposite to) the direction of the gradient of the loss function to generate the adversarial example.

As one example, the adversarial example generator 126 can treat the input data as optimizable parameters and backpropagate the loss function all the way through the model and further through the input data to modify the input data (e.g., in the second direction). In some implementations, the model parameters can be fixed during such backpropagation while the input data (e.g., the original training example) is treated as optimizable. Other techniques for generating adversarial examples can be used as well in addition or alternatively to the opposite gradient direction technique described above.

More particularly, adversarial training can generate, for every batch of training examples, several adversarial versions of those examples by, for example, intentionally perturbing a training example in the direction opposite to the one in which the model would move by training on that sample. In some implementations, this can mean taking a step in a direction other than the gradient for the training example.

According to an aspect of the present disclosure, the adversarial example generator 126 can generate the adversarial training example according to or in accordance with one or more hyperparameters. The hyperparameters can be configurable parameters of the generation process.

As one example, the one or more hyperparameters can include a step size hyperparameter that controls a magnitude of a step in the second direction performed when perturbing the training example (e.g., according to the backpropagation technique described above). In some instances, the step size hyperparameter can be referred to or otherwise represented by an epsilon.

In particular, to make sure the resulting adversarial examples are realistic, it is useful to generally know how far the generation process is allowed to move in the input space. Larger steps can generate more effective or meaningful adversarial examples. However, step sizes that are too large can result in adversarial examples that do not appear realistic and therefore do not assist in combating adversarial techniques that rely on realistic, but malicious input data.

As another example, the one or more hyperparameters can include a norm hyperparameter that controls a norm applied to the gradient prior to said perturbing. For example, the norm hyperparameter can control whether the norm of the gradient is taken when determining the second direction. Further, if the norm is to be taken, the norm hyperparameter can control which norm is applied. Example norms that can be applied include the infinity norm, the L2 norm, or other norms.

As yet another example, the one or more hyperparameters can include a loss hyperparameter that controls the loss function for which the gradient is determined. Example losses that can be used as the loss function include a cross-entropy loss, a cost of loss, or other loss functions. Furthermore, in some implementations, an additional perturbation function can be applied on top of the gradient (e.g., as an additional layer between the model and the input data during backpropagation) and can be controlled by the loss hyperparameter and/or other hyperparameters.

As another example, the one or more hyperparameters can include whether to perform iterative perturbations to the sample. As a further example, in the event iterative perturbations are to be performed, an additional example hyperparameter can control how many iterations should be performed or how to modify various settings, controls, etc. between each iteration.

According to another aspect of the present disclosure, the feedback manager 128 of the adversarial training system 122 can determine a realism score for the adversarial example that indicates whether the adversarial example appears realistic. For example, the realism score for the adversarial example can indicate whether the adversarial example appears realistic to a human observer.

In some implementations, to determine the realism score, the feedback manager 128 can provide the adversarial example to an application via an API. The application can generate a realism score for the adversarial example. The feedback manager 128 can receive the realism score for the adversarial example from the application via the application programming interface.

In some implementations, the computing system (e.g., the application and/or the feedback manager 128) can determine the realism score for the adversarial example by inputting the adversarial example into a scoring function that heuristically evaluates the adversarial example.

As one example, the scoring function can determine whether the adversarial example still matches an input data space. For example, if the adversarial example exceeds an input boundary or otherwise does not conform to the acceptable input data space, the scoring function can score the adversarial example as being less realistic. In contrast, if the adversarial example matches the input data space, the scoring function can score the adversarial example as being more realistic. As one example of this concept applied to natural language processing inputs, the scoring function can determine whether the adversarial example includes nonsense words or undefined words and, if so, the adversarial example can be viewed as failing to match the input data space.

In some implementations, the scoring function and/or other component can provide and/or perform a corrective action that would allow the adversarial example to become compliant with the input data space.

As another example, for adversarial examples that include imagery, the scoring function can determine an L2 distance between pixel values of the imagery. The scoring function can provide a realism score based at least in part on the L2 distance(s). Many other and different scoring functions can be used to assess the realism of input data of other and different types.

In some implementations, in addition or alternatively to applying a scoring function, the computing system (e.g., the application and/or the feedback manager 128) can determine the realism score for the adversarial example by providing the adversarial example for display to a human user and receiving feedback from the human user that indicates whether the adversarial example appears realistic.

In some implementations, the feedback manager 128 can determine whether to keep (e.g., for use in adversarially training the machine-learned model) or discard the adversarial example based at least in part on the realism score. As one example, when the realism score is greater than a threshold score, the feedback manager 128 can store the adversarial example for use in training the machine-learned model. As one example, the adversarial example can be stored in a local memory of the device 102. For example, the adversarial example can be stored in the training example cache 124 of the adversarial training system 122. Alternatively or additionally, the adversarial example can be stored in a training example cache (e.g., cache 134 a) of the corresponding application (e.g., application 120 a). In contrast, when the realism score is less than the threshold score, the feedback manager 128 can discard the adversarial example. In some implementations, the threshold score can be a user-configurable variable.

According to another aspect of the present disclosure, a hyperparameter controller 130 of the adversarial training system 122 can adjust at least one of the one or more hyperparameters based at least in part on the realism score for the adversarial example. For example, the hyperparameter controller 130 can tweak the hyperparameters in a way which will provide more realistic adversarial samples. For example, this may include reducing the step size hyperparameter, changing the norm hyperparameter, changing the loss hyperparameter, and/or increasing the iteration count hyperparameter.

After adjusting the hyperparameter(s), the adversarial example generator 126 can generate an additional adversarial example according to the adjusted hyperparameters. A model trainer 131 of the adversarial training system 122 or other training component can train the machine-learned model based at least in part on the additional adversarial example. For example, the additional adversarial example can be designated as a positive training example for a class assigned to the adversarial example by a human observer.

The model trainer 131 can use various training or learning techniques, such as, for example, backwards propagation of errors. In some implementations, performing backwards propagation of errors can include performing truncated backpropagation through time. The model trainer 131 can perform a number of generalization techniques (e.g., weight decays, dropouts, etc.) to improve the generalization capability of the models being trained.
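
As a hedged illustration only, a trainer such as model trainer 131 might apply these techniques through a standard framework as sketched below; the layer sizes, regularization strength, and optimizer choice are arbitrary.

```python
import tensorflow as tf

def train_with_generalization(train_examples, train_labels, epochs=5):
    """Illustrative training loop applying weight decay and dropout during backpropagation."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(
            64, activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # weight decay
        tf.keras.layers.Dropout(0.2),                            # dropout
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    # Backwards propagation of errors over the (adversarially augmented) training batch.
    model.fit(train_examples, train_labels, epochs=epochs)
    return model
```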

According to another aspect of the present disclosure, the adversarial training system 122 can iteratively generate and evaluate (e.g., determine a realism score for) adversarial training examples. For example, the adversarial training system 122 can iteratively generate and evaluate adversarial training examples until the generated adversarial example(s) satisfy one or more criteria. For example, the adversarial training system 122 can iteratively adjust one or more hyperparameters until a most recent realism score exceeds a threshold score; until a moving average of realism scores exceeds the threshold score; and/or until one or more other criteria are met. In some implementations, the threshold score can be a user-configurable variable. In such fashion, the adversarial training system 122 can iteratively generate and evaluate adversarial training examples until a desired balance between realism and significance is reached.

As one example, the hyperparameter controller 130 can iteratively reduce a step size hyperparameter that controls a magnitude of a step performed when generating the adversarial example. For example, the hyperparameter controller 130 can iteratively reduce the step size hyperparameter until a most recent realism score exceeds a threshold score; until a moving average of realism scores exceeds the threshold score; and/or until one or more other criteria are met.

It will be appreciated that, in these implementations, the generation of effective, application-specific adversarial examples allows the model to be effectively trained against malicious adversarial attacks and thereby increases the security of the system against such attacks.

The adversarial training system 122 may be in the form of one or more computer programs stored locally on the computing device 102 (e.g., a smartphone or tablet), which are configured, when executed by the device 102, to perform machine learning management operations which enable performance of on-device machine learning functions on behalf of one or more locally-stored applications 120 a-c or other local clients.

In some implementations, the adversarial training system 122 can be included in or implemented as an application, such as, for example, a mobile application. As one example, in the context of the Android operating system, the on-device adversarial training system 122 can be included in an Android Package Kit (APK) that can be downloaded and/or updated. In another example, the adversarial training system 122 can be included in or implemented as a portion of the operating system of the device 102, rather than as a standalone application.

Each of the adversarial example generator 126, feedback manager 128, hyperparameter controller 130, and model trainer 131 includes computer logic utilized to provide desired functionality. Each of the adversarial example generator 126, feedback manager 128, hyperparameter controller 130, and model trainer 131 can be implemented in hardware, firmware, and/or software controlling a general purpose processor. For example, in some implementations, each of the adversarial example generator 126, feedback manager 128, hyperparameter controller 130, and model trainer 131 includes program files stored on a storage device, loaded into a memory and executed by one or more processors. In other implementations, each of the adversarial example generator 126, feedback manager 128, hyperparameter controller 130, and model trainer 131 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM, a hard disk, or optical or magnetic media.

Thus, as illustrated in FIG. 1, the computing device 102 can locally implement the adversarial training system 122 as an on-device platform. For example, the computing device 102 can operate to provide the adversarial training system 122 as a service (e.g., via one or more APIs) to machine-learned models 132 a-c locally stored on the device 102 (e.g., included in one or more applications 120 a-c installed on and executed by the device 102).

However, in other implementations of the present disclosure, a server computing system can implement the adversarial training system 122 as a service that is accessible to devices over a network (e.g., via one or more APIs). As yet another example, certain aspects of the adversarial training system 122 can be performed on-device (e.g., by the computing device 102) while other aspects of the adversarial training system 122 can be performed by the server computing system.

FIG. 2 depicts a graphical diagram of an example process of generating adversarial examples for machine-learned models according to example embodiments of the present disclosure. FIG. 2 illustrates one example data flow. Other processes or data flows that differ from that illustrated in FIG. 2 can be used to implement aspects of the present disclosure.

Referring to FIG. 2, at stage 1, a training example is transferred from the training example cache 134 a to the adversarial example generator 126 (e.g., via an API). The adversarial example generator 126 generates an adversarial example based on the obtained training example.

At stage 2, the adversarial example generator 126 provides the generated adversarial example to the application 120 a (e.g., via an API). The application 120 a generates a realism score for the adversarial example.

At stage 3, the application 120 a provides the realism score to the feedback manager 128 (e.g., via an API). If the feedback manager 128 determines (e.g., based on the realism score) that the adversarial example should be used to train the model, then at stage 4 the feedback manager 128 provides (e.g., via an API) the adversarial training example for storage in the training example cache 134 a (e.g., labelled as a positive training example for a same class or other output label as the original training example obtained from the training example cache 134 a).

Next, at stage 5, the adversarial training example is provided (e.g., via an API) from the training example cache 134 a to the model trainer 131 (e.g., along with a number of other training examples from the cache 134 a).

At stage 6, the model trainer 131 cooperatively communicates with application 120 a (e.g., via an API) to train a model used by application 120 a based on the training example(s) received from the training example cache 134 a, including the adversarial training example.

FIG. 3 depicts a graphical diagram of example personalization and federated learning data flows according to example embodiments of the present disclosure.

More particularly, FIG. 3 depicts three different learning data flows which may in some instances be used in a complementary fashion. In a first data flow, shown primarily in dashed line at the bottom of FIG. 3, training data is generated on a user device. The training data is uploaded to a central authority which then trains or re-trains a machine-learned model based on the uploaded data. The model is then sent to the user device for use (e.g., on-device inference).

In a second data flow which can be referred to as personalization or personalized learning, the training data created on the user device is used to train or re-train the model on the device. The re-trained model is then used by such device. This personalized learning enables per-device models to be trained and evaluated without centralized data collection, thereby enhancing data security and user privacy.

In a third data flow which can be referred to as federated learning, the training data created on the user device is used to train or re-train the model on the device. Thus, the actual user-specific training data is not uploaded to the cloud, thereby enhancing data security and user privacy.

However, after such on-device learning, the user device can provide an update to a central authority. For example, the update can describe one or more parameters of the re-trained model or one or more changes to the parameters of the model that occurred during the re-training of the model.

The central authority can receive many such updates from multiple devices and can aggregate the updates to generate an updated global model. The updated global model can then be re-sent to the user device. This scheme enables cross-device models to be trained and evaluated without centralized data collection.
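
As a rough sketch only, the aggregation step might look like the following, assuming each device reports its update as a list of NumPy arrays (one per model parameter tensor) and that simple unweighted averaging is sufficient; production federated learning systems typically weight updates by example counts and add secure aggregation.

```python
import numpy as np

def aggregate_updates(device_updates):
    """Average per-device parameter deltas into a single global update."""
    num_devices = len(device_updates)
    return [sum(deltas) / num_devices          # average each parameter tensor's deltas
            for deltas in zip(*device_updates)]

def apply_global_update(global_params, global_update):
    """Apply the aggregated update to the global model parameters."""
    return [param + delta for param, delta in zip(global_params, global_update)]
```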

Adversarial training examples generated according to aspects of the present disclosure can be included in the training stages which occur in any of these three data flows.

Example Methods

FIG. 4 depicts a flowchart diagram of an example method 400 to enable adversarial training of machine-learned models according to example embodiments of the present disclosure. Although FIG. 4 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 400 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.

At 402, a computing system can obtain a training example for a machine-learned model. As one example, obtaining the training example at 402 can include obtaining a personal training example that is stored at a local memory of a computing device that performs the method 400. For example, the training example can be obtained from a training example cache maintained by an application that also includes the machine-learned model. As another example, the training example can be obtained from a centralized training example cache stored on a centralized machine learning platform.

At 404, the computing system can generate an adversarial example from the training example. In particular, the computing system can generate the adversarial example according to one or more hyperparameters.

In some implementations, generating the adversarial example from the training example at 404 can include determining a direction of a gradient of a loss function that evaluates an output provided by the machine-learned model when given at least a portion of the training example as an input. Generating the adversarial example from the training example at 404 can further include perturbing the training example in a second direction that is opposite to the direction of the gradient of the loss function to generate the adversarial example.

As one example, one of the hyperparameters can include a step size hyperparameter that controls a magnitude of a step in the second direction performed when perturbing the training example. As another example, one of the hyperparameters can include a norm hyperparameter that controls a norm applied to the gradient prior to perturbing the training example. As yet another example, one of the hyperparameters can include a loss hyperparameter that controls the loss function for which the gradient is determined.

At 406, the computing system can determine a realism score for the adversarial example that indicates whether the adversarial example appears realistic. In particular, in some implementations, the realism score for the adversarial example can indicate whether the adversarial example appears realistic to a human observer. In another example, the realism score (or some other form of score generated for the adversarial example) can represent a position of the adversarial example relative to a position of the training example in an input data space for the model.

In some implementations, determining the realism score for the adversarial example at 406 can include providing the adversarial example to an application via an application programming interface and receiving the realism score for the adversarial example from the application via the application programming interface. As another example, determining the realism score for the adversarial example at 406 can include providing the adversarial example for display to a human user and receiving feedback from the human user that indicates whether the adversarial example appears realistic. For example, the application can provide the adversarial example for display to the user, or a centralized platform can provide the adversarial example for display.

As another example, determining the realism score for the adversarial example at 406 can include inputting the adversarial example into a scoring function that heuristically evaluates the adversarial example. As one example, the scoring function can heuristically evaluate one or more properties of the adversarial example, relative to an input data space, to generate the score.
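
One simple heuristic of the kind described above scores the adversarial example by how far it has moved from the original training example in the input data space, treating smaller perturbation distances as more realistic. The sketch below is an assumed illustration only; the function name, scale parameter, and choice of distance are not taken from the disclosure:

```python
import numpy as np

def heuristic_realism_score(adversarial, original, scale=1.0):
    """Return a score in (0, 1]: closer to the original example reads as more realistic.

    The distance in input space serves as a proxy for how perceptible the
    perturbation is; other properties of the input space could be folded in.
    """
    distance = np.linalg.norm(np.asarray(adversarial) - np.asarray(original))
    return 1.0 / (1.0 + distance / scale)
```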

At 408, the computing system can adjust at least one of the one or more hyperparameters based at least in part on the realism score. As one example, the computing system can reduce a step size hyperparameter. As other examples, the computing system can adjust or otherwise change a loss hyperparameter and/or a norm hyperparameter. In some implementations, the computing system can adjust the one or more hyperparameters according to a binary search scheme.
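
For example, a binary-search-style adjustment of the step size hyperparameter might look like the sketch below, in which an unrealistic example pulls the step size toward a lower bound and a realistic example pushes it back toward an upper bound so the example remains effective. The threshold, bounds, and names are assumptions for illustration:

```python
def adjust_step_size(step_size, realism_score, threshold=0.5,
                     low=1e-4, high=1.0):
    """Binary-search-style update of the step size hyperparameter (block 408)."""
    if realism_score < threshold:
        # Example looks unrealistic: move the step size toward the lower bound.
        return max(low, (low + step_size) / 2.0)
    # Example looks realistic: move back toward the upper bound to keep
    # the adversarial example as effective as possible.
    return min(high, (step_size + high) / 2.0)
```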

At 410, the computing system can generate an additional adversarial example according to the adjusted one or more hyperparameters. For example, the same generation process performed at 404 can be performed again, except according to the adjusted hyperparameters.

In some implementations, after 410, the method 400 can return to 406 and determine a new realism score for the new additional adversarial example generated at 410. This is indicated in FIG. 4 by the dashed line. Thus, in some implementations, the computing system can iteratively perform blocks 406, 408, and 410 for a plurality of iterations. This can enable iterative improvement in the adversarial example. As one example, the computing system can iteratively reduce a step size hyperparameter that controls a magnitude of a step performed when generating the adversarial example.

As one example, the computing system can iteratively perform blocks 406, 408, and 410 until a most recent realism score exceeds a threshold score; until a running average of realism scores exceeds a threshold value; until an iteration-over-iteration change in realism scores falls below a threshold value; and/or until one or more other criteria are satisfied.

After a final iteration of 410, method 400 proceeds to block 412. In some implementations, the computing system can store each iteratively-generated adversarial training example that received a realism score greater than a threshold value, while discarding any iteratively-generated adversarial training examples that received a realism score less than the threshold value.
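
Putting blocks 406-410 together, the iterative refinement can be sketched as below, reusing the illustrative `generate_adversarial` and `adjust_step_size` functions from the sketches above. The stopping criterion shown (a most recent score exceeding a threshold, with an iteration cap) is one of the options listed above, and the keep/discard threshold implements the storage behavior just described; all names and defaults are assumptions:

```python
def refine_adversarial(x, grad_fn, score_fn, step_size=0.5,
                       keep_threshold=0.5, stop_threshold=0.8,
                       max_iterations=10):
    """Iterate blocks 406-410: score the example, adjust the step size, regenerate."""
    kept = []
    adversarial = generate_adversarial(x, grad_fn, step_size)
    for _ in range(max_iterations):
        score = score_fn(adversarial)                             # block 406
        if score >= keep_threshold:
            kept.append(adversarial)                              # store realistic examples
        if score >= stop_threshold:
            break                                                 # stopping criterion met
        step_size = adjust_step_size(step_size, score)            # block 408
        adversarial = generate_adversarial(x, grad_fn, step_size) # block 410
    return kept
```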

In some implementations, some or all of the threshold scores described above can be user-configurable to allow the user to determine an appropriate balance between realism, step size, speed of performance, and/or other trade-offs.

At 412, the computing system can train the machine-learned model based at least in part on the adversarial example and/or the additional adversarial example.
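
As one framework-agnostic illustration of 412 (the `train_step` callable and all names are assumptions), the retained adversarial examples can simply be interleaved with the ordinary training data:

```python
def train_with_adversarial(train_step, examples, adversarial_examples):
    """Train on original examples plus retained adversarial examples (block 412).

    train_step: callable taking (input, label) and applying one update
        to the machine-learned model.
    examples: iterable of (input, label) pairs.
    adversarial_examples: iterable of (adversarial_input, label) pairs.
    """
    for x, y in examples:
        train_step(x, y)
    for x_adv, y in adversarial_examples:
        # Each adversarial example is used as an additional training
        # example for the label associated with the original example.
        train_step(x_adv, y)
```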

Additional Disclosure

The technology discussed herein makes reference to servers, databases, software applications, and other computer-based systems, as well as actions taken and information sent to and from such systems. The inherent flexibility of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components. For instance, processes discussed herein can be implemented using a single device or component or multiple devices or components working in combination. Databases and applications can be implemented on a single system or distributed across multiple systems. Distributed components can operate sequentially or in parallel.

While the present subject matter has been described in detail with respect to various specific example embodiments thereof, each example is provided by way of explanation, not limitation of the disclosure. Those skilled in the art, upon attaining an understanding of the foregoing, can readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, the subject disclosure does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. For instance, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. Thus, it is intended that the present disclosure cover such alterations, variations, and equivalents.

1-20. (canceled)
21. A computer-implemented method, the method comprising: perturbing, by one or more computing devices, image data to generate adversarial image data configured to cause a machine-learned model to misrecognize content depicted in the image data; providing, by the one or more computing devices to a user device, the adversarial image data for display on the user device; receiving, by the one or more computing devices and from the user device, user feedback indicating recognition by the user of the content; and training, by the one or more computing devices, the machine-learned model based at least in part on the adversarial image data.
22. The method of claim 21, wherein the user feedback comprises a class assigned to the content by the user.
23. The method of claim 21, wherein the content corresponds to a class, and wherein the user feedback indicates recognition of the class by the user.
24. The method of claim 21, wherein the user feedback indicates that the content is readable.
25. The method of claim 21, wherein perturbing the image data comprises: determining, by the one or more computing devices, a direction of a gradient of a loss function that evaluates an output provided by the machine-learned model when given at least a portion of the image data as an input; perturbing, by the one or more computing devices, the image data in a second direction that is opposite to the direction of the gradient of the loss function; and generating, by the one or more computing devices and based at least in part on the perturbed image data, the adversarial image data.
26. The method of claim 21, wherein the image data is perturbed according to one or more updated perturbation parameters, wherein the one or more updated perturbation parameters were obtained by: perturbing, by the one or more computing devices, first image data to generate first adversarial image data configured to cause the machine-learned model to misrecognize first content depicted in the first image data, wherein the first image data is perturbed according to one or more first perturbation parameters; providing, by the one or more computing devices to a first user device, the first adversarial image data for display on the first user device; receiving, by the one or more computing devices and from the first user device, first user feedback indicating recognition failure by the first user of the first content; and updating, automatically by the one or more computing devices, the one or more first perturbation parameters to obtain the one or more updated perturbation parameters, the one or more updated perturbation parameters configured to decrease a magnitude of the perturbation.
27. The method of claim 26, wherein the one or more first perturbation parameters comprise a step size hyperparameter that controls a magnitude of a step performed during the perturbation.
28. The method of claim 27, wherein updating the one or more first perturbation parameters comprises decreasing, automatically by the one or more computing devices, the step size.
29. A computing system, comprising: one or more processors; and one or more non-transitory computer-readable media storing instructions that are executable by the one or more processors to cause the computing system to perform operations, the operations comprising: perturbing image data to generate adversarial image data configured to cause a machine-learned model to misrecognize content depicted in the image data; providing, to a user device, the adversarial image data for display on the user device; receiving, from the user device, user feedback indicating recognition by the user of the content; and training the machine-learned model based at least in part on the adversarial image data.
30. The computing system of claim 29, wherein the user feedback comprises a class assigned to the content by the user.
31. The computing system of claim 29, wherein the content corresponds to a class, and wherein the user feedback indicates recognition of the class by the user.
32. The computing system of claim 29, wherein the user feedback indicates that the content is readable.
33. The computing system of claim 29, wherein perturbing the image data comprises: determining a direction of a gradient of a loss function that evaluates an output provided by the machine-learned model when given at least a portion of the image data as an input; perturbing the image data in a second direction that is opposite to the direction of the gradient of the loss function; and generating, based at least in part on the perturbed image data, the adversarial image data.
34. The computing system of claim 29, wherein the image data is perturbed according to one or more updated perturbation parameters, wherein the one or more updated perturbation parameters were obtained by: perturbing first image data to generate first adversarial image data configured to cause the machine-learned model to misrecognize first content depicted in the first image data, wherein the first image data is perturbed according to one or more first perturbation parameters; providing, to a first user device, the first adversarial image data for display on the first user device; receiving, from the first user device, first user feedback indicating recognition failure by the first user of the first content; and updating, automatically, the one or more first perturbation parameters to obtain the one or more updated perturbation parameters, the one or more updated perturbation parameters configured to decrease a magnitude of the perturbation.
35. The computing system of claim 34, wherein the one or more first perturbation parameters comprise a step size hyperparameter that controls a magnitude of a step performed during the perturbation.
36. The computing system of claim 35, wherein updating the one or more first perturbation parameters comprises decreasing, automatically by the one or more computing devices, the step size.
37. One or more non-transitory computer-readable media storing instructions that are executable by one or more processors to cause a computing system to perform operations, the operations comprising: perturbing image data to generate adversarial image data configured to cause a machine-learned model to misrecognize content depicted in the image data; providing, to a user device, the adversarial image data for display on the user device; receiving, from the user device, user feedback indicating recognition by the user of the content; and training the machine-learned model based at least in part on the adversarial image data.
38. The one or more non-transitory computer-readable media of claim 37, wherein the user feedback comprises a class assigned to the content by the user.
39. The one or more non-transitory computer-readable media of claim 37, wherein the content corresponds to a class, and wherein the user feedback indicates recognition of the class by the user.
40. The one or more non-transitory computer-readable media of claim 37, wherein the user feedback indicates that the content is readable.